library(rtweet)
## Warning: 程辑包'rtweet'是用R版本4.2.2 来建造的
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()  masks stats::filter()
## ✖ purrr::flatten() masks rtweet::flatten()
## ✖ dplyr::lag()     masks stats::lag()
library(tidytext)
## Warning: 程辑包'tidytext'是用R版本4.2.2 来建造的
library(ggmap)
## Warning: 程辑包'ggmap'是用R版本4.2.2 来建造的
## ℹ Google's Terms of Service: <]8;;https://mapsplatform.google.comhttps://mapsplatform.google.com]8;;>
## ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.
library(ROAuth)
## Warning: 程辑包'ROAuth'是用R版本4.2.2 来建造的
library(twitteR)
## Warning: 程辑包'twitteR'是用R版本4.2.2 来建造的
## 
## 载入程辑包:'twitteR'
## 
## The following objects are masked from 'package:dplyr':
## 
##     id, location
## 
## The following object is masked from 'package:rtweet':
## 
##     lookup_statuses
library(RCurl)
## 
## 载入程辑包:'RCurl'
## 
## The following object is masked from 'package:tidyr':
## 
##     complete
library(httr)
## Warning: 程辑包'httr'是用R版本4.2.2 来建造的
library(tm)
## Warning: 程辑包'tm'是用R版本4.2.2 来建造的
## 载入需要的程辑包:NLP
## 
## 载入程辑包:'NLP'
## 
## The following object is masked from 'package:httr':
## 
##     content
## 
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library(wordcloud)
## Warning: 程辑包'wordcloud'是用R版本4.2.2 来建造的
## 载入需要的程辑包:RColorBrewer
library(syuzhet)
## Warning: 程辑包'syuzhet'是用R版本4.2.2 来建造的
## 
## 载入程辑包:'syuzhet'
## 
## The following object is masked from 'package:rtweet':
## 
##     get_tokens
library(httpuv)
## Warning: 程辑包'httpuv'是用R版本4.2.2 来建造的

Introduction

https://github.com/helloiamk/RRR727

An eSIM (embedded-SIM) is a new type of programmable SIM card that is embedded directly into a device. after the Samsung Gear S2 Classic 3G smartwatch first implementing an eSIM in 2016 (Vincent 2016), many brands including Apple, Google and Microsoft have added esim support for use for their devices in the past few years. And from the perspective of consumer, e-sim give people the ability and possibility of comparing networks and selecting service at will-directly from their devices (Meukel, Schwarz, and Winter 2016). Considering the convenience of esim compared with physical sim cards, more and more people choose esim services in the past few years.

Apple announced its first iPhone models without a SIM card tray in September 2022 with the release of the iPhone 14, iPhone 14 Plus, iPhone 14 Pro, and iPhone 14 Pro Max. These versions only support eSIM and are the first iPhones to do so.(“IPhone 14 Pro and 14 Pro Max - Technical Specifications,” n.d.) After that, some criticism appeared on some social platforms, which triggers our interest in the overall opinion towards esim in the real general population. In this project, we will conduct a sentiment analysis using tweets from Twitter to figure out whether people hold positive or negative views towards esim and also investigate the top frequent words concerning about and related to esim.

Data

Data obtaining

auth_setup_default()
## Using default authentication available.
## Reading auth from 'C:\Users\kkk\AppData\Roaming/R/config/R/rtweet/default.rds'
#install.packages("httpuv")
tweets <- search_tweets("#esim", n = 500,lang="en")
#tweets
tweets.df = as.data.frame(tweets)

In this project, we conducted our sentiment analysis based on collected tweets from Twitter using ‘rtweet’ package. We searched related tweets with hashtag esim, setting the total number of collected post as 500 and limited to posts in English. Finally, we collected 314 eligible posts in total for analysis.

Data pre-processing

After gathering raw material, we conducted pre-processing of our text. We removed retweets, references to screen names, hashtags, spaces, numbers, punctuations, emojies, speical characters and urls to clean data. After that, it allows us to identify emotions from each tweet initially, which is the start of further analysis.

#clean data
tweets.df$text=gsub("&amp", "", tweets.df$text)
tweets.df$text = gsub("&amp", "", tweets.df$text)
tweets.df$text = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", tweets.df$text)
tweets.df$text = gsub("@\\w+", "", tweets.df$text)
tweets.df$text = gsub("[[:punct:]]", "", tweets.df$text)
tweets.df$text = gsub("[[:digit:]]", "", tweets.df$text)
tweets.df$text = gsub("http\\w+", "", tweets.df$text)
tweets.df$text = gsub("[ \t]{2,}", "", tweets.df$text)
tweets.df$text = gsub("^\\s+|\\s+$", "", tweets.df$text)

tweets.df$text <- iconv(tweets.df$text, "UTF-8", "ASCII", sub="")

After cleaning our raw text material at the first steps of pre-processing data, we got a relative clean text data. Then we need to used it to conduct the sentiment analysis. First, in this project, we used the NRC dictionary, which is a dictionary of English words and their association with eight basic emotion and two sentiments (Mohammad and Turney 2013) and widely applied in sentiment analysis, in order to identify emotions from each tweet.

# Emotions for each tweet using NRC dictionary
emotions <- get_nrc_sentiment(tweets.df$text)
## Warning: `spread_()` was deprecated in tidyr 1.2.0.
## Please use `spread()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
emo_bar = colSums(emotions)
emo_sum = data.frame(count=emo_bar, emotion=names(emo_bar))
emo_sum$emotion = factor(emo_sum$emotion, levels=emo_sum$emotion[order(emo_sum$count, decreasing = TRUE)])

Apart from identifying emotions from each tweets, we also created comparison word cloud data through pasting texts grouped by emotion categories and named it as wordcloud_tweet. Then we created our corpus for this project based on wordcloud_tweets. As the pre-process text in the first beginning of dealing with raw text from Twitter, we also conducted a series of action to clean the corpus. This series of action made it clear, rational and feasible to conduct sentiment analysis and draw more accurate conclusion. In particular, we removed punctuation and stopwords commonly for English words, converted every word in lower cases and stemming the words. Then we created document term matrix. As a note, we need to exclude words with more than 11 characters in this project so that the words can fit nicely into the wordcloud.

# Create comparison word cloud data

wordcloud_tweet = c(
  paste(tweets.df$text[emotions$anger > 0], collapse=" "),
  paste(tweets.df$text[emotions$anticipation > 0], collapse=" "),
  paste(tweets.df$text[emotions$disgust > 0], collapse=" "),
  paste(tweets.df$text[emotions$fear > 0], collapse=" "),
  paste(tweets.df$text[emotions$joy > 0], collapse=" "),
  paste(tweets.df$text[emotions$negative > 0], collapse=" "),
  paste(tweets.df$text[emotions$positive > 0], collapse=" "),
  paste(tweets.df$text[emotions$sadness > 0], collapse=" "),
  paste(tweets.df$text[emotions$surprise > 0], collapse=" "),
  paste(tweets.df$text[emotions$trust > 0], collapse=" ")
  
)
# create corpus
corpus = Corpus(VectorSource(wordcloud_tweet))

# remove punctuation, convert every word in lower case and remove stop words

corpus = tm_map(corpus, tolower)
## Warning in tm_map.SimpleCorpus(corpus, tolower): transformation drops documents
corpus = tm_map(corpus, removePunctuation)
## Warning in tm_map.SimpleCorpus(corpus, removePunctuation): transformation drops
## documents
corpus = tm_map(corpus, removeWords, c(stopwords("english")))
## Warning in tm_map.SimpleCorpus(corpus, removeWords, c(stopwords("english"))):
## transformation drops documents
corpus = tm_map(corpus, stemDocument)
## Warning in tm_map.SimpleCorpus(corpus, stemDocument): transformation drops
## documents
# create document term matrix

tdm = TermDocumentMatrix(corpus)

# convert as matrix
tdm = as.matrix(tdm)
tdmnew <- tdm[nchar(rownames(tdm)) < 11,]

Results

Data exploration

ts_plot(tweets) +
  theme_minimal() +
  theme(plot.title = element_text()) +
  labs(
    x = NULL, y = NULL,
    title = "Frequency of #esim Twitter statuses from past 9 days",
    subtitle = "Twitter status (tweet) counts aggregated using three-hour intervals",
    caption = "Source: Data collected from Twitter's REST API via rtweet"
  )

First, we counted the frequencies of #esim Twitter status in the past 10 days (from 3rd Dec to 12th Dec) and aggregated the counts using three-hour intervals. Then we drew a line plot to describe and display the relationship between the frequency of posting #esim posts and time. We found that , generally the number of related posts are in a quite low volume and relatively stable at time dimension. However, on Dec 7th, especially in the afternoon, there is a sharply increase in esim related posts in Twitter. We returned to the tweets data frame to checked the content posted on Dec 7th, and founded that different users posted similar posts on the market promotion activities of newly esim plan in some mobile telecommunication companies, which can explain the reason for increase in related posts in Twitter on 7th Dec.

dtm_v <- sort(rowSums(tdm),decreasing=TRUE)
dtm_d <- data.frame(word = names(dtm_v),freq=dtm_v)
# Display the top 20 most frequent words
head(dtm_d, 20)
# Plot the most frequent words
barplot(dtm_d[2:20,]$freq, las = 2, names.arg = dtm_d[2:20,]$word,
        col ="lightblue", main ="Top frequent words",
        ylab = "Word frequencies")

set.seed(1234)
wordcloud(words = dtm_d[-1,]$word, freq = dtm_d$freq, min.freq = 5,
          max.words=100, random.order=FALSE, rot.per=0.40, 
          colors=brewer.pal(8, "Dark2"))

The figures above displayed the top frequent words in different ways. The first one is the display of the rows of top 20 most frequent words from the data frame. We also drew a barplot to show the frequencies of frequent words in a descending order. What is worth mentioning, we excluded the word “esim” from most frequent words because esim would be a stop word in this context. The result is shown in the second graph. We also created a word cloud to display the popularities of word in a clearer way. And the result is shown in the third graph.

Based on the three graphs we created above, the top 5 frequent words are plan, alreadi, data, travel and one. We can have a basic understanding that people may care about data plan of esim and they may also consider the traveling situation regarding esim. They might the connection issue on esim, as well.

This is the basic exploration on our data focusing on the frequenties in different days in a time period and top frequent words. Next part we will discuss the analysis results regarding emotions and sentiments.

Analysis

# Visualize the emotions from NRC sentiments
library(plotly)
## Warning: 程辑包'plotly'是用R版本4.2.2 来建造的
## 
## 载入程辑包:'plotly'
## The following object is masked from 'package:httr':
## 
##     config
## The following object is masked from 'package:ggmap':
## 
##     wind
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
p <- plot_ly(emo_sum, x=~emotion, y=~count, type="bar", color=~emotion) %>%
  layout(xaxis=list(title=""), showlegend=FALSE,
         title="Emotion Type for hashtag")
p
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

We used NRC lexicon to categorizes words from binary fashion(yes or no) into ten different description: positive, negative, anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. We drew a barplot to show the counts of different sentiments and emotions in descending order. Based on the graph above,it is obviously clear that people’s attitude towards esim is very positive (197 counts), compared with negative (29 counts). The positive sentiment with the more dominant emotions are anticipation (134 counts) and trust (82 counts). Therefore, we can concluded that overall, people hold positive views towards esim.

# column name binding
colnames(tdm) = c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative', 'positive', 'sadness', 'surprise', 'trust')
colnames(tdmnew) <- colnames(tdm)

comparison.cloud(tdmnew, random.order=TRUE,
                 colors = c("#00B2FF", "red", "#FF0099", "#6600CC", "green", "orange", "blue", "brown", "black", "purple"),
                 title.size=1, max.words=250, scale=c(2.5, 0.4),rot.per=0.4)

To better reflect people’s attitudes, we continued to extend the NRC dictionary and generated word clouds based on it. And the figure above can display a general idea on the popularity of words in different emotions. We also drew a barplot to show the most frequent words in each attitude to show it in a vivid way. The plot is displayed as below.

mdata<-tdmnew[rownames(tdmnew)!="esim",]

par(mfrow=c(3,4))
for(i in 1:10){
  vect<-mdata[order(mdata[,i],decreasing = TRUE)[1:10],i]
  
  par(las=2)
  barplot(vect, 
          main=colnames(mdata)[i], 
          horiz=TRUE,
          col=rainbow(10)[i])
}

In this part, based on the high-frequency words for each attitude, we can basically assume what the main reasons for the different attitudes are. Take anticipation as an example, “travel”“data”“connect”“mobile” and “link” allows us to know that the main expectation of people comes from the link for mobile devices when traveling. As to the surprise, words”roam”“data plan” and “wireless” clearly gives us an image of people using data plans that can support wireless roaming while a trip.

Discussion

In an era of rapid technological development, tweets can be a very reliable source of information. We can use tweets as auxiliary data in social surveys, and sentiment analysis of tweets can help us effectively identify people’s attitudes towards different emerging technologies nowadays in a situation where people’s emotions are constantly changing. From a business perspective, companies can use sentiment analysis to understand how satisfied users are with their goods and thus develop good marketing strategies. From the policy perspective, government departments can understand citizens’ sentiment tendency toward popular events and grasp public opinion orientation, so as to monitor public opinion more timely and effectively, and also provide support for the formulation of relevant policies.

In this analysis, we extract sentiment words based on the text processing of the text to be analyzed according to the constructed sentiment lexicon and calculate the sentiment tendency of tweets text. And through the analysis, we can see that people’s positive emotions about esim are greater than their negative ones. And the source of the positive sentiment is mainly that people think it is an upgrade and provides more convenience for data roaming while traveling. At the same time, there are some negative sentiments about the use of wireless esim being very unfriendly to some devices that only support physical sim cards, and it is easy to see how this can cause problems for some people when traveling abroad to countries that do not support esim .

But our project still has some limitations, first of all, we can not retrieval data long ago, which leads to a missing of some relevant key timing. Second, the social media data have certain limitations, such as sampling bias and lack of demographic information(Yuan et al. 2020). We recommend calibrating sampling errors by collecting additional sources of information during practical applications. Meanwhile, it is true that computer programs have trouble identifying such things as sarcasm, irony, jokes, and hyperbole, which a person has little difficulty identifying. And not realizing that can distort the facts (Romeo, n.d.).But we would not deny that in such an era of information explosion, sentiment analysis is a very efficient way to monitor public opinion.

References

“IPhone 14 Pro and 14 Pro Max - Technical Specifications.” n.d. Apple. https://www.apple.com/iphone-14-pro/specs/.
Meukel, Markus, Markus Schwarz, and Matthias Winter. 2016. “E-SIM for Consumers—a Game Changer in Mobile Telecommunications?” McKinsey Quarterly, 1–8.
Mohammad, Saif M., and Peter D. Turney. 2013. “Crowdsourcing a Word-Emotion Association Lexicon.” Computational Intelligence 29 (3): 436–65.
Romeo. n.d. “The Benefits (and Limitations) of Online Sentiment Analysis Tools.” Typely Forum. https://typely.com/blogs/entry/7-the-benefits-and-limitations-of-online-sentiment-analysis-tools/.
Vincent, James. 2016. “Samsung’s Gear S2 Has the First Certified Esim That Lets You Choose Carriers.” The Verge. The Verge. https://www.theverge.com/2016/2/18/11044624/esim-wearable-smartwatch-samsung-gear-s2.
Yuan, Yihong, Yongmei Lu, T. Edwin Chow, Chao Ye, Abdullatif Alyaqout, and Yu Liu. 2020. “The Missing Parts from Social Media–Enabled Smart Cities: Who, Where, When, and What?” Annals of the American Association of Geographers 110 (2): 462–75. https://doi.org/10.1080/24694452.2019.1631144.